Limited bandwidth to affect processor design

Authors

  • Doug Burger
  • James R. Goodman
  • Alain Kägi

Abstract

The phenomenal improvements in microprocessor performance place significant demands on memory systems, requiring a low-latency, high-bandwidth stream of operands. Researchers point out that DRAM access latencies (measured in processor cycles) are growing, and that any request that misses in the caches may eventually take hundreds of cycles to satisfy. These researchers have proposed many techniques to mitigate the penalties of long memory latencies, such as lockup-free caches, cache-conscious load scheduling, hardware and software prefetching, stream buffers, speculative loads and execution, multithreading, data value prediction, and instruction reuse. Most of these techniques reduce the impact of contentionless access latencies, but they do so at the cost of increasing a program's bandwidth requirements. These latency tolerance (or reduction) techniques may increase a processor's memory bandwidth needs by causing the processor to request the same stream of operands in less time, or by causing it to request more data from memory. In turn, these techniques may cause the processor to stall due to queueing in the memory system. (We differentiate between the "intrinsic latency" of a contentionless access and the latency added by queueing in the memory system, which results from limited bandwidth.) Neither the long latencies nor the increased bandwidth requirements constitute a "memory wall" that will eventually inhibit improved microprocessor performance. Instead, designers will employ a range of design decisions and new technologies to produce balanced, cost-effective systems. The extent to which long latency and bandwidth requirements affect performance will determine which techniques or technologies are affordable, and which are worth the effort of implementing.
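
The distinction between intrinsic latency and queueing latency can be illustrated with a small model. The Python sketch below is a hypothetical illustration, not the authors' methodology: it assumes a single memory channel modeled as an M/D/1 queue (Poisson-distributed miss arrivals, a fixed transfer time per cache line), and every parameter value (100-cycle intrinsic latency, 64-byte lines, 8 bytes/cycle) is made up. It shows both effects named above: finishing the same misses in less time, or issuing extra traffic, raises bandwidth demand, and the queueing term then grows even though the intrinsic latency stays fixed.

```python
# Hypothetical model (not the article's): one memory channel modeled as an
# M/D/1 queue, i.e. Poisson-distributed miss arrivals and a fixed transfer
# time per cache line. All parameter values are illustrative.


def bandwidth_demand(bytes_requested: float, cycles: float) -> float:
    """Bandwidth a program needs, in bytes per cycle."""
    return bytes_requested / cycles


def queueing_delay_md1(arrival_rate: float, service_time: float) -> float:
    """Mean wait in an M/D/1 queue (Pollaczek-Khinchine formula).

    arrival_rate: misses per cycle; service_time: cycles to transfer one line.
    """
    rho = arrival_rate * service_time  # channel utilization
    if rho >= 1.0:
        raise ValueError("demand exceeds available bandwidth (utilization >= 1)")
    return rho * service_time / (2.0 * (1.0 - rho))


def mean_miss_latency(intrinsic: float, arrival_rate: float,
                      service_time: float) -> float:
    """Contentionless (intrinsic) latency plus queueing delay from finite bandwidth."""
    return intrinsic + queueing_delay_md1(arrival_rate, service_time)


if __name__ == "__main__":
    INTRINSIC = 100.0  # cycles for an uncontended miss (hypothetical)
    LINE = 64          # bytes per cache line (hypothetical)
    SERVICE = 8.0      # cycles per line at 8 bytes/cycle (hypothetical)
    scenarios = {
        "baseline": (1000, 20000),
        "same misses, half the time": (1000, 10000),
        "plus 20% extra traffic": (1200, 10000),
    }
    for name, (misses, cycles) in scenarios.items():
        demand = bandwidth_demand(misses * LINE, cycles)
        latency = mean_miss_latency(INTRINSIC, misses / cycles, SERVICE)
        print(f"{name}: {demand:.1f} B/cycle, {latency:.1f} cycles per miss")
```

In this toy setting the average miss latency climbs from roughly 103 to 116 to 196 cycles across the three scenarios, while the intrinsic latency never changes; only the queueing term grows as channel utilization approaches 1. The M/D/1 choice is just a convenient closed form; a real memory system would need a more detailed model.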
Since some of the solutions trade improved latency for increased traffic, or higher bandwidth for increased latency, the relative effects that these two components of memory accesses have on performance are important. Here, we quantify and compare the performance impacts of memory latencies and finite bandwidth. We show that the implementation of aggressive latency tolerance techniques aggravates stalls due to finite memory bandwidth, which actually become more significant than stalls resulting from uncongested memory latency alone. We expect that memory bandwidth limitations across the processor pins will drive significant architectural change, for the following reasons:

  • Continuing progress in processor design will increase the instruction issue rate. These advances include both architectural innovation (wider issue, speculative execution, and so forth) and circuit advances (faster, denser logic).
  • To the extent that latency tolerance techniques are successful, they will speed up the retirement rate …
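
Why a rising issue rate runs into the pin-bandwidth limit can be sketched with a simple lower bound on execution time. This is a hypothetical simplification, not the article's model: it assumes computation and off-chip transfers overlap perfectly, so run time is bounded by whichever is slower (a roofline-style bound). The function name `min_cycles` and all workload numbers are illustrative.

```python
# Hypothetical simplification (not the article's model): computation and
# off-chip transfers overlap perfectly, so execution time is bounded below
# by whichever of the two takes longer.


def min_cycles(instructions: float, issue_rate: float,
               offchip_bytes: float, pin_bandwidth: float) -> float:
    """Lower bound on cycles: max of issue-limited and bandwidth-limited time."""
    compute_cycles = instructions / issue_rate       # instructions / (instructions per cycle)
    transfer_cycles = offchip_bytes / pin_bandwidth  # bytes / (bytes per cycle)
    return max(compute_cycles, transfer_cycles)


if __name__ == "__main__":
    # Hypothetical workload: 1e6 instructions and 4e6 bytes crossing the pins
    # at 8 bytes/cycle, so transfers alone need 500,000 cycles.
    for ipc in (1, 2, 4, 8):
        print(ipc, min_cycles(1e6, ipc, 4e6, 8))
```

With these numbers, raising the issue rate from 1 to 2 halves the bound, but beyond 2 it stays at 500,000 cycles: further gains in issue or retirement rate are absorbed entirely by the fixed pin bandwidth.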

Journal:
  • IEEE Micro

Volume 17

Publication date: 1997